ABSTRACT

We describe an algorithm for identifying point-source transients and moving objects on
reference-subtracted optical images containing artifacts of processing and instrumentation. The
algorithm makes use of the supervised machine learning technique known as Random Forest. We
present results from its use in the Dark Energy Survey Supernova program (DES-SN), where
it was trained using a sample of 898,963 signal and background events generated by the
transient detection pipeline. After reprocessing the data collected during the first DES-SN
observing season (Sep. 2013 through Feb. 2014) using the algorithm, the number of transient
candidates eligible for human scanning decreased by a factor of 13.4, while only 1.0 percent of
the artificial Type Ia supernovae (SNe) injected into search images to monitor survey efficiency
were lost, most of which were very faint events. Here we characterize the algorithm’s performance
in detail, and we discuss how it can inform pipeline design decisions for future time-domain
imaging surveys, such as the Large Synoptic Survey Telescope and the Zwicky Transient Facility.
An implementation of the algorithm and the training data used in this paper are available at
http://portal.nersc.gov/project/dessn/autoscan.

Subject headings: transients - discovery, algorithms - statistical, random forest, machine learning. 

To identify scientifically valuable transients or moving objects on the sky, imaging surveys
have historically adopted a manual approach, employing humans to visually inspect images for
signatures of the events (e.g., Zwicky 1964; Hamuy et al. 1993; Perlmutter et al. 1997;
Schmidt et al. 1998; Filippenko et al. 2001; Strolger et al. 2004; Blanc et al. 2004;
Astier et al. 2006; Sako et al. 2008; Mainzer et al. 2011; Waszczak et al. 2013;
Rest et al. 2014). But recent advances in the capabilities of telescopes, detectors, and
supercomputers have fueled a dramatic rise in the data production rates of such surveys,
straining the ability of their teams to quickly and comprehensively look at images to perform
discovery.


For surveys that search for objects on difference images (CCD images that reveal changes in
the appearance of a region of the sky between two points in time), this problem of data volume
is compounded by the problem of data purity. Difference images are produced by subtracting
reference images from single-epoch images in a process that involves point-spread function
(PSF) matching and image distortion (see, e.g., Alard & Lupton 1998). In addition to legitimate
detections of astrophysical variability, they can contain artifacts of the differencing
process, such as poorly subtracted galaxies, and artifacts of the single-epoch images, such as
cosmic rays, optical ghosts, star halos, defective pixels, near-field objects, and CCD edge
effects. Some examples are presented in Figure 1. These artifacts can vastly outnumber the
signatures of scientifically valuable sources on the images, forcing object detection
thresholds to be considerably higher than what is to be expected from Gaussian fluctuations.


For time-domain imaging surveys with a spectroscopic follow-up program, these issues of data
volume and purity are compounded by time pressure to produce lists of the most promising

Fig. 1. Cutouts of DES difference images, roughly 14 arcsec on a side, centered on legitimate (green
boxes; left four columns of figure) and spurious (red boxes; right four columns of figure) objects, at a
variety of signal-to-noise ratios: (a) S/N < 10, (b) 10 < S/N < 30, (c) 30 < S/N < 100. The cutouts are
subclassed to illustrate both the visual diversity of spurious objects and the homogeneity of authentic ones.
Objects in the “Transient” columns are real astrophysical transients that subtracted cleanly. Objects in the
“Fake SN” columns are fake SNe Ia injected into transient search images to monitor survey efficiency. The
column labeled “CR/Bad Column” shows detections of cosmic rays (rows b and c) and a bad column on the
CCD detector (row a). The columns labeled “Bad Sub” show non-varying astrophysical sources that did
not subtract cleanly; this can result from poor astrometric solutions, shallow templates, or bad observing
conditions. The numbers at the bottom of each cutout indicate the score that each detection received from
the machine learning algorithm introduced in §3; a score of 1.0 indicates the algorithm is perfectly confident
that the detection is not an artifact, while a score of 0.0 indicates the opposite.

targets for follow-up observations before they become too faint to observe or fall outside a
window of scientific utility. Ongoing searches for Type Ia supernovae (SNe Ia) out to z ~ 1,
such as the Panoramic Survey Telescope and Rapid Response System Medium Deep Survey’s
(Rest et al. 2014) and the Dark Energy Survey’s (DES; Flaugher 2005), face all three of these
challenges. The DES supernova program (DES-SN; Bernstein et al. 2012), for example, produces up
to 170 gigabytes of raw imaging data on a nightly basis. Visual examination of sources
extracted from the resulting difference images using SExtractor (Bertin & Arnouts 1996)
revealed that ~93 percent are artifacts, even after selection cuts (Kessler et al. 2015, in
preparation). Additionally, the survey has a science-critical spectroscopic follow-up program
for which it must routinely select the ~10 most promising transient candidates from hundreds of
possibilities, most of which are artifacts. This program is crucial to survey science as it
allows DES to confirm transient candidates as SNe, train and optimize its photometric SN typing
algorithms (e.g., PSNID; Sako et al. 2011; NNN; Karpenka, Feroz, & Hobson 2013), and
investigate interesting non-SN transients. To prepare a list of objects eligible for
consideration for spectroscopic follow-up observations, members of DES-SN scanned nearly
1 million objects extracted from difference images during the survey’s first observing season,
the numerical equivalent of nearly a week of uninterrupted scanning time, assuming scanning one
object takes half a second.
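
The scanning-time estimate above can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted in the text (about one million objects, half a second per object); the function name `scanning_days` is illustrative and not part of any DES software.

```python
# Back-of-the-envelope check of the scanning-time estimate quoted in the text.
N_OBJECTS = 1_000_000       # "nearly 1 million objects"
SECONDS_PER_OBJECT = 0.5    # scanning time per object assumed in the text
SECONDS_PER_DAY = 86_400


def scanning_days(n_objects, seconds_per_object):
    """Return the uninterrupted scanning time in days."""
    return n_objects * seconds_per_object / SECONDS_PER_DAY


days = scanning_days(N_OBJECTS, SECONDS_PER_OBJECT)
print(f"{days:.1f} days")   # prints "5.8 days", i.e. nearly a week
```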

For DES to meet its discovery goals, more efficient techniques for artifact rejection on
difference images are needed. Efforts to “crowd-source” similar large-scale classification
problems have been successful at scaling with growing data rates; websites such as
Zooniverse.org have accumulated over one million users to tackle a variety of astrophysical
classification problems, including the classification of transient candidates from the Palomar
Transient Factory (PTF; Smith et al. 2011). However, for DES to optimize classification
accuracy and generate reproducible classification decisions, automated techniques are required.

To reduce the number of spurious candidates considered for spectroscopic follow-up, many
surveys impose selection requirements on quantities that can be directly and automatically
computed from the raw imaging data. Making hard selection cuts of this kind has been shown to
be a suboptimal technique for artifact rejection in difference imaging. Although such cuts are
automatic and easy to interpret, they do not naturally handle correlations between features,
and they are an inefficient way to select a subset of the high-dimensional feature space as the
number of dimensions grows large (Bailey et al. 2007).

In contrast to selection cuts, machine learning (ML) classification techniques provide a
flexible solution to the problem of artifact rejection in difference imaging. In general, these
techniques attempt to infer a precise mapping between numeric features that describe
characteristics of observed data, and the classes or labels assigned to those data, using a
training set of feature-class pairs. ML classification algorithms that generate decision rules
using labeled data (data whose class membership has already been definitively established) are
called “supervised” algorithms. After generating a decision rule, supervised ML classifiers can
be used to predict the classes of unlabeled data instances. For a review of supervised ML
classification in astronomy, see, e.g., Ivezić et al. (2013). For an introduction to the
statistical underpinnings of supervised ML classification techniques, see Willsky, Wornell, &
Shapiro (2003).
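
To make the train-then-predict workflow concrete, the following toy sketch fits a decision rule to a handful of labeled feature vectors and applies it to unlabeled ones. It deliberately uses a nearest-centroid rule as a stand-in for a real learner; it is not the Random Forest algorithm discussed in this paper, and the feature values and function names are invented for illustration.

```python
import math


def fit_centroids(features, labels):
    """Learn one mean feature vector (centroid) per class from labeled data."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Assign an unlabeled feature vector to the class with the closest centroid."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


# Hypothetical two-feature training set of feature-class pairs.
train_x = [[0.1, 0.2], [0.0, 0.1], [0.9, 1.0], [1.0, 0.8]]
train_y = ["Artifact", "Artifact", "Non-Artifact", "Non-Artifact"]

rule = fit_centroids(train_x, train_y)
print(predict(rule, [0.95, 0.9]))
```

The same two-step interface (fit on labeled data, then predict on new data) carries over to more capable supervised learners.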


Such classifiers address many of the shortcomings of scanning and selection cuts. ML
algorithms’ decisions are automatic, reproducible, and fast enough to process streaming data in
real-time. Their biases can be systematically and quantitatively studied, and, most
importantly, given adequate computing resources, they remain fast and consistent in the face of
increasing data production rates. As more data are collected, ML methods can continue to refine
their knowledge about a data set (see §5.1), thereby improving their predictive performance on
future data. Supervised ML classification techniques are currently used in a variety of
astronomical contexts, including time-series analysis, such as the classification of variable
stars (Richards et al. 2011) and SNe (Karpenka, Feroz, & Hobson 2013) from light curves, and
image analysis, such as the typing of galaxies (Banerji et al. 2010) and discovery of
trans-Neptunian objects (Gerdes et al. 2015, in preparation) on images. Although their input
data types differ, light curve shape and image-based ML classification frameworks are quite
similar: both operate on tabular numeric classification features computed from raw input data
(see §3.2).


The use of supervised machine learning classification techniques for artifact rejection in
difference imaging was pioneered by Bailey et al. (2007) for the Nearby Supernova Factory
(Aldering et al. 2002) using imaging data from the Near-Earth Asteroid Tracking program¹ and
the Palomar-QUEST Consortium, using the 112-CCD QUEST-II camera (Baltay et al. 2007). They
compared the performance of three supervised classification techniques (a Support Vector
Machine, a Random Forest, and an ensemble of boosted decision trees) in separating a
combination of real and fake detections of SNe from background events. They found that boosted
decision trees constructed from a library of astrophysical domain features (magnitude, FWHM,
distance to the nearest object in the reference co-add, measures of roundness, etc.) provided
the best overall performance.


Bloom et al. (2012) built on the methodology of Bailey et al. (2007) by developing a highly
accurate Random Forest framework for classifying detections of variability extracted from PTF
difference images. Brink et al. (2013) made improvements to the classifier of Bloom et al.
(2012), setting an unbroken benchmark for best overall performance on the PTF data set, using
the technique of recursive feature elimination to optimize their classifier. Recently,
du Buisson et al. (2014) published a systematic comparison of several classification algorithms
using features based on Principal Component Analysis (PCA) extracted from Sloan Digital Sky
Survey-II SN survey difference images. Finally, Wright et al. (2015) used a pixel-based
approach to engineer a Random Forest classifier for the Pan-STARRS Medium Deep Survey.

In this article, we describe autoScan, a computer program developed for this purpose in
DES-SN. Our main objective is to report the methodology that DES-SN adopted to construct an
effective supervised classifier, with an eye toward informing the design of similar frameworks
for future time domain surveys such as the Large Synoptic Survey Telescope (LSST; LSST Science
Collaboration 2009) and the Zwicky Transient Facility (ZTF; Smith et al. 2014). We extend the
work of previous authors to a newer, larger data set, showing how greater selection efficiency
can be achieved by increasing training set size, using generative models for training data, and
implementing new classification features.

¹ http://neat.jpl.nasa.gov


The structure of the paper is as follows. In §2, we provide an overview of DES and the DES-SN
transient detection pipeline. In §3, we describe the development of autoScan. In §4, we present
metrics for evaluating the code’s performance and review its performance on a realistic
classification task. In §5, we discuss lessons learned and areas of future development that can
inform the design of similar frameworks for future surveys.


2. The Dark Energy Survey and Transient Detection Pipeline

In this section, we introduce DES and the DES-SN transient detection pipeline (“DiffImg”;
Kessler et al. 2015, in preparation), which produced the data used to train and validate
autoScan. DES is a Stage III ground-based dark energy experiment designed to provide the
tightest constraints to date on the dark energy equation of state parameter using observations
of the four most powerful probes of dark energy suggested by the Dark Energy Task Force (DETF;
Albrecht et al. 2006): SNe Ia, galaxy clusters, baryon acoustic oscillations, and weak
gravitational lensing. DES consists of two interleaved imaging surveys: a wide-area survey that
covers 5,000 deg² of the south Galactic cap in five filters (grizY), and DES-SN, a time-domain
transient survey that covers 10 (8 “shallow” and 2 “deep”) 3 deg² fields in the XMM-LSS,
ELAIS-S, CDFS, and Stripe-82 regions of the sky, in four filters (griz). The survey’s main
instrument, the Dark Energy Camera (DECam; Diehl et al. 2012; Flaugher et al. 2012; Flaugher
et al. 2015, submitted), is a 570-megapixel 3 deg² imager with 62 fully depleted, red-sensitive
CCDs. It is mounted at the prime focus of the Victor M. Blanco 4 m telescope at the Cerro
Tololo Inter-American Observatory (CTIO). DES conducted “science verification” (SV)
commissioning observations from November 2012 until February 2013, and it began science
operations in August 2013 that will continue until at least 2018 (Diehl et al. 2014). The data
used in this article are from the first season of DES science operations (“Y1”; Aug. 2013
through Feb. 2014).

A schematic of the pipeline that DES-SN employs to discover transients is presented in
Figure 2. Transient survey “science images” are single-epoch CCD images from the DES-SN fields.
After the image subtraction step, sources are extracted using SExtractor. Sources that pass the
cuts described in the Object section of Table 1 are referred to as “detections.” A “raw
candidate” is defined when two or more detections match to within 1″. A raw candidate is
promoted to a “science candidate” when it passes the NUMEPOCHS requirement in Table 1. This
selection requirement was imposed to reject Solar System objects, such as main belt asteroids
and Kuiper belt objects, which move substantially on images from night to night. Science
candidates are eligible for visual examination and spectroscopic follow-up observations. During
the observing season, science candidates are routinely photometered, fit with multi-band SN
light curve models, visually inspected, and slated for spectroscopic follow-up.
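
The promotion from detections to raw candidates to science candidates can be sketched as follows. This is an illustrative sketch only: the greedy grouping strategy, the flat-sky separation formula, and all class and function names are assumptions made for the example and are not taken from DiffImg. Only the 1″ matching radius and the NUMEPOCHS ≥ 2 requirement come from the text.

```python
import math
from dataclasses import dataclass, field

MATCH_RADIUS_ARCSEC = 1.0   # matching radius quoted in the text
MIN_EPOCHS = 2              # NUMEPOCHS lower limit from Table 1


@dataclass
class Detection:
    ra: float      # degrees
    dec: float     # degrees
    night: int     # integer night identifier (hypothetical)


@dataclass
class Candidate:
    detections: list = field(default_factory=list)

    @property
    def position(self):
        """Use the first detection as the reference position (an assumption)."""
        first = self.detections[0]
        return first.ra, first.dec

    @property
    def num_epochs(self):
        """Number of distinct nights on which the candidate was detected."""
        return len({d.night for d in self.detections})


def separation_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcsec; ignores RA wrap-around at 0/360 deg."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    return math.hypot(dra, dec1 - dec2) * 3600.0


def associate(detections):
    """Greedily group detections lying within the matching radius."""
    groups = []
    for det in detections:
        for group in groups:
            ra, dec = group.position
            if separation_arcsec(det.ra, det.dec, ra, dec) <= MATCH_RADIUS_ARCSEC:
                group.detections.append(det)
                break
        else:
            groups.append(Candidate([det]))
    return groups


def raw_candidates(groups):
    """Groups with two or more matched detections."""
    return [g for g in groups if len(g.detections) >= 2]


def science_candidates(groups):
    """Raw candidates detected on at least MIN_EPOCHS distinct nights."""
    return [g for g in raw_candidates(groups) if g.num_epochs >= MIN_EPOCHS]
```

A fast-moving Solar System object detected twice on a single night would form a raw candidate under this sketch but would not satisfy the distinct-night requirement, which is the behavior the NUMEPOCHS cut is meant to produce.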


3. Classifier Development 

In this section, we describe the development of autoScan. We present the classifier’s training
data set (§3.1), its classification feature set (§3.2), and the selection (§3.3), properties
(§3.4), and optimization (§3.5) of its core classification algorithm.


3.1. Training Data 

To make probabilistic statements about the class membership of new data, supervised ML
classifiers must be trained or fit to existing data whose true class labels are already known.
Each data instance is described by numeric classification “features” (see §3.2.2); an effective
training data set must approximate the joint feature distributions of all classes considered.
Objects extracted from difference images can belong to one of two classes: “Artifacts,” or
“Non-Artifacts.” Examples of each class must be present in the training set. Failing to include
data from certain regions of feature space can corrode the predictive performance of the
classifier in those regions, introducing bias into the search that can systematically degrade
survey efficiency (Richards et al. 2012). Because the training set compilation described here
took place during the beginning of Y1, it was complicated by a lack of available visually
scanned “non-artifact” sources.


Fortunately, labeling data does not necessarily require humans to visually inspect images.
Bloom et al. (2012) discuss a variety of methods for labeling detections of variability
produced by difference imaging pipelines, including scanning alternatives such as artificial
source construction and spectroscopic follow-up. Scanning, spectroscopy, and using fake data
each have their respective merits and drawbacks. Scanning is laborious and potentially
inaccurate, especially if each data instance is only examined by one scanner, or if scanners
are not well trained. However, a large group of scanners can quickly label a number of
detections sufficient to create a training set for a machine classifier, and Brink et al.
(2013) have shown that the supervised classification algorithm Random Forest, which was
ultimately selected for autoScan, is insensitive to mislabeled training data up to a
contamination level of 10 percent.

Photometric typing (e.g., Sako et al. 2011) can also be useful for labeling detections of
transients. However, robust photometric typing requires well-sampled light curves, which in
turn require high-cadence photometry of difference image objects over timescales of weeks or
months. This requirement is prohibitive for imaging surveys in their early stages. Further,
because photometric typing is an integral part of the spectroscopic target selection process,
by extension new imaging surveys also have too few detections of spectroscopically confirmed
SNe, AGN, or variable stars. Native spectroscopic training samples are therefore impractical
sources of training data for new surveys.


Artificial source construction is the fastest method for generating native detections of
non-artifact sources in the early stages of a survey. Large numbers of artificial transients
(“fakes”) can be injected into survey science images, and by construction their associated
detections are true positives. Difficulties can arise when the joint feature distributions of
fakes selected for the training set do not approximate the joint feature distributions of
observed transients in production. In DES-SN, SN Ia fluxes from fake SN Ia light curves are
overlaid on images near real galaxies. The fake SN Ia

Fig. 2. Schematic of the DES-SN transient detection pipeline. The magnitudes of fake SNe Ia used to
monitor survey efficiency are calibrated using the zero point of the images into which they are injected and
generated according to the procedure described in §3.1. The autoScan step (red box) occurs after selection
cuts are applied to objects extracted from difference images and before objects are spatially associated into
raw transient candidates. Codes used at specific steps are indicated in parentheses.

Table 1

DES-SN OBJECT AND CANDIDATE SELECTION REQUIREMENTS.

Set        Feature       Lower Limit          Upper Limit          Description
---------  ------------  -------------------  -------------------  ------------------------------------------------------------
Object     MAG           ...                  30.0                 Magnitude from SExtractor.
Object     A_IMAGE       ...                  1.5 pix.             Length of semi-major axis from SExtractor.
Object     SPREAD_MODEL  ...                  $3\sigma_S + 1.0$    Star-galaxy separation output parameter from SExtractor. $\sigma_S$ is the estimated SPREAD_MODEL uncertainty.
Object     CHISQ         ...                  $10^4$               $\chi^2$ from PSF-fit to 35 × 35 pixel cutout around object in difference image.
Object     SNR           3.5                  ...                  Flux from a PSF-model fit to a 35 × 35 pixel cutout around the object divided by the uncertainty from the fit.
Object     VETOMAG^a     21.0                 ...                  Magnitude from SExtractor for use in veto catalog check.
Object     VETOTOL^a     Magnitude-dependent  ...                  Separation from nearest object in veto catalog of bright stars.
Object     DIPOLE6       ...                  2                    $N_\mathrm{pix}$ in 35 × 35 pixel object-centered cutout at least $6\sigma$ below 0.
Object     DIPOLE4       ...                  20                   $N_\mathrm{pix}$ in 35 × 35 pixel object-centered cutout at least $4\sigma$ below 0.
Object     DIPOLE2       ...                  200                  $N_\mathrm{pix}$ in 35 × 35 pixel object-centered cutout at least $2\sigma$ below 0.
Candidate  NUMEPOCHS     2                    ...                  Number of distinct nights that the candidate is detected.

^a The difference imaging pipeline is expected to produce false positives near bright or variable stars, thus
all difference image objects are checked against a “veto” catalog of known bright and variable stars and are
rejected if they are brighter than 21st magnitude and within a magnitude-dependent radius of a veto catalog
source. Thus only one of VETOMAG and VETOTOL must be satisfied for an object to be selected.

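
The Object requirements of Table 1 amount to a conjunction of simple threshold tests, with the two veto requirements combined by a logical OR as described in footnote a. The sketch below is a minimal illustration under stated assumptions: the class and field names are hypothetical, the limits are treated as inclusive (the table does not say), and the magnitude-dependent veto radius is taken as an input (`veto_tol`) because its functional form is not given here.

```python
from dataclasses import dataclass, replace


@dataclass
class DiffObject:
    mag: float               # SExtractor magnitude
    a_image: float           # semi-major axis, pixels
    spread_model: float      # SExtractor SPREAD_MODEL
    spread_model_err: float  # sigma_S, the SPREAD_MODEL uncertainty
    chisq: float             # chi^2 of the PSF fit
    snr: float               # PSF-fit flux divided by its uncertainty
    dipole6: int             # pixels at least 6 sigma below 0
    dipole4: int             # pixels at least 4 sigma below 0
    dipole2: int             # pixels at least 2 sigma below 0
    veto_sep: float          # separation from nearest veto-catalog source
    veto_tol: float          # magnitude-dependent veto radius (supplied by caller)


def passes_veto(obj):
    """Only one of VETOMAG and VETOTOL needs to be satisfied (footnote a)."""
    return obj.mag >= 21.0 or obj.veto_sep >= obj.veto_tol


def is_detection(obj):
    """Apply the Object cuts of Table 1, assuming inclusive limits."""
    return (
        obj.mag <= 30.0
        and obj.a_image <= 1.5
        and obj.spread_model <= 3.0 * obj.spread_model_err + 1.0
        and obj.chisq <= 1.0e4
        and obj.snr >= 3.5
        and obj.dipole6 <= 2
        and obj.dipole4 <= 20
        and obj.dipole2 <= 200
        and passes_veto(obj)
    )


# Hypothetical example object with invented feature values.
example = DiffObject(mag=22.5, a_image=1.2, spread_model=0.5, spread_model_err=0.1,
                     chisq=50.0, snr=8.0, dipole6=0, dipole4=3, dipole2=40,
                     veto_sep=0.5, veto_tol=2.0)
print(is_detection(example))
```

Each cut acts on a single feature in isolation, which is why such a rule cannot exploit correlations between features.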
light curves are generated by the SNANA simulation (Kessler et al. 2009), and they include true
parent populations of stretch and color, a realistic model of intrinsic scatter, a redshift
range from 0.1 to 1.4, and a galaxy location proportional to surface brightness. On difference
images, detections of overlaid fakes are visually indistinguishable from real point-source
transients and Solar System objects moving slowly enough not to streak. All fake SN Ia light
curves are generated and stored prior to the start of the survey. The overlay procedure is part
of the difference imaging pipeline, where the SN Ia flux added to the image is scaled by the
zero point, spread over nearby pixels using a model of the PSF, and fluctuated by random
Poisson noise. These fakes are used to monitor the single-epoch transient detection efficiency,
as well as the candidate efficiency in which detections on two distinct nights are required. On
average, six detections of fake SNe are overlaid on each single-epoch CCD image.
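
The three steps of the overlay procedure (zero-point scaling, PSF spreading, Poisson fluctuation) can be sketched as below. This is not the DES-SN implementation: the circular Gaussian PSF, the stamp size, the treatment of counts as photoelectrons (unit gain), the normal approximation used for large expected counts, and all function names are assumptions made for illustration.

```python
import math
import random


def fake_flux(mag, zero_point):
    """Convert a fake-SN magnitude to image counts using the image zero point."""
    return 10.0 ** (-0.4 * (mag - zero_point))


def gaussian_psf(size, fwhm_pix):
    """Return a size x size Gaussian stamp normalized to unit sum (assumed PSF form)."""
    sigma = fwhm_pix / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    center = (size - 1) / 2.0
    stamp = [[math.exp(-((x - center) ** 2 + (y - center) ** 2) / (2.0 * sigma ** 2))
              for y in range(size)] for x in range(size)]
    total = sum(map(sum, stamp))
    return [[value / total for value in row] for row in stamp]


def poisson(lam, rng):
    """Draw one Poisson variate: Knuth's method, normal approximation for large lam."""
    if lam <= 0.0:
        return 0
    if lam > 50.0:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def fake_stamp(mag, zero_point, fwhm_pix, size=15, rng=None):
    """Return a noisy stamp of counts to be added to the search image."""
    rng = rng or random.Random()
    flux = fake_flux(mag, zero_point)
    psf = gaussian_psf(size, fwhm_pix)
    return [[poisson(flux * weight, rng) for weight in row] for row in psf]
```

A call such as `fake_stamp(22.0, 31.5, 4.0, rng=random.Random(0))` returns a 15 × 15 grid of non-negative integer counts whose sum scatters around `fake_flux(22.0, 31.5)`; the magnitude, zero point, and FWHM in that call are invented values.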

The final autoScan training set contained detections of visually scanned artifacts and
artificial sources only. We did not include detections of photometrically typed transients to
minimize the contamination of the “Non-Artifact” class with false positives. Bailey et al.
(2007) also used a training set in which the “Non-Artifact” class consisted largely of
artificial sources.


With 898,963 training instances in total, the autoScan training set is the largest used for
difference image artifact rejection in production. It was split roughly evenly between “real”
and “artifact” labeled instances: 454,092 were simulated SNe Ia injected onto host galaxies,
while the remaining 444,871 detections were human-scanned artifacts. Compiling a set of
artifacts to train autoScan was accomplished by taking a random sample of the objects that had
been scanned as artifacts by humans during an early processing of DES Y1 data with a
pared-down version of the difference imaging pipeline presented in Figure 2.


3.2. Features and Processing 

The supervised learning algorithms we consider in this analysis are nonlinear functions that
map points representing individual detections in feature space to points in a space of object
classes or class probabilities. The second design choice in developing autoScan is therefore to
define a suitable feature space in which to represent the data instances we wish to use for
training, validation, and prediction. In this section, we describe the classification features
that we computed from the raw output of the difference imaging pipeline, as well as the steps
used to pre- and post-process these features.


3.2.1. Data Preprocessing 

The primary data sources for autoScan features are 51 × 51 pixel object-centered search,
template, and difference image cutouts. The template and difference image cutouts are
sky-subtracted. The search image cutout is sky-subtracted if and only if it does not originate
from a coadded exposure, though this is irrelevant for what follows as no features are directly
computed from search image pixel values. Photometric measurements, SExtractor output
parameters, and other data sources are also used. Each cutout associated with a detection is
compressed to 25 × 25 pixels. The seeing for each search image is usually no less than
1 arcsec, while the DECam pixel scale lies between 0.262 and 0.264 arcsec depending on the
location on the focal plane, so the PSF typically spans about four pixels and little
information is lost during compression. Although some artifacts are sharper than the seeing, we
found that using compressed cutouts to compute some features resulted in better performance.

Consider a search, template, or difference image cutout associated with a single detection.
Let the matrix element $I_{x,y}$ of the 51 × 51 matrix $I$ represent the flux value of the
pixel at location x, y on the cutout. We adopt the convention of zero-based indexing and the
convention that element (0, 0) corresponds to the pixel at the top left-hand corner of the
cutout. Let the matrix element $C_{x,y}$ of the 25 × 25 matrix $C$ represent the flux value of
the pixel at location x, y on the compressed cutout. Then $C$ is defined element-wise from $I$
via


$$ C_{x,y} = \frac{1}{N_u} \sum_{i=0}^{1} \sum_{j=0}^{1} I_{2x+i,\,2y+j}, \tag{1} $$


where $N_u$ is the number of unmasked pixels in the sum. Masked pixels are excluded from the
sum. Only when all four terms in the sum represent masked pixels is the corresponding pixel
masked in $C$. Note that matrix elements from the right-hand column and last row of $I$ never
appear in Equation 1.

To ensure that the pixel flux values across cutouts are comparable, we rescale the pixel values
of each compressed cutout via

$$ R_{x,y} = \frac{C_{x,y} - \mathrm{med}(C)}{\hat{\sigma}}, \tag{2} $$

where the matrix element $R_{x,y}$ of the 25 × 25 matrix $R$ represents the flux value of the
pixel at location x, y on the compressed, rescaled cutout, and $\hat{\sigma}$ is a consistent
estimator of the standard deviation of $C$. We take the median absolute deviation as a
consistent estimator of the standard deviation (Rousseeuw & Croux 1993), according to

$$ \hat{\sigma} = \frac{\mathrm{med}(|C - \mathrm{med}(C)|)}{\Phi^{-1}(3/4)}, \tag{3} $$

where $1/\Phi^{-1}(3/4) \approx 1.4826$ is the reciprocal of the inverse cumulative
distribution for the standard normal distribution evaluated at 3/4. This is done to ensure that
the effects of defective pixels and cosmic rays nearly perpendicular to the focal plane are
suppressed. We therefore have the following closed-form expression for the matrix element
$R_{x,y}$,

$$ R_{x,y} = \frac{C_{x,y} - \mathrm{med}(C)}{1.4826\,\mathrm{med}(|C - \mathrm{med}(C)|)}. \tag{4} $$

The rescaling expresses the value of each pixel on the compressed cutout as the number of
standard deviations above the median. Masked pixels are excluded from the computation of the
median in Equation 4.
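
The median-absolute-deviation rescaling of Equation 4 can be sketched with the Python standard library alone. In this illustration masked pixels are represented by `None`, the scale factor is computed from the standard normal quantile rather than hard-coded, and a zero deviation raises an error; these are choices made for the example, and the function name `rescale` is hypothetical.

```python
import statistics

# 1 / Phi^{-1}(3/4), approximately 1.4826.
MAD_TO_SIGMA = 1.0 / statistics.NormalDist().inv_cdf(0.75)


def rescale(compressed):
    """Express each pixel as robust standard deviations above the median (Equation 4)."""
    values = [v for row in compressed for v in row if v is not None]
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    sigma = MAD_TO_SIGMA * mad
    if sigma == 0.0:
        raise ValueError("median absolute deviation is zero; cannot rescale")
    return [[None if v is None else (v - med) / sigma for v in row]
            for row in compressed]


# Tiny synthetic example: values 0..8 on a 3 x 3 grid (median 4, MAD 2).
grid = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
rescaled = rescale(grid)
print(rescaled[1][1])   # prints "0.0": the median pixel maps to zero
```

Because both the location and the scale are medians, a single very bright pixel such as a cosmic-ray hit changes the result only slightly.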


Finally, an additional rescaling from Brink et al. (2013) is defined according to

$$ B_{x,y} = \frac{I_{x,y} - \mathrm{med}(I)}{\max(|I|)}. \tag{5} $$

The size of $B$ is 51 × 51. We found that using $B$ instead of $R$ or $I$ to compute certain
features resulted in better classifier performance. Masked pixels are excluded from the
computation of the median in Equation 5.
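
Equation 5 can be sketched in the same style. As before, masked pixels are represented by `None`; taking the maximum over unmasked pixels only is an assumption of this example, since the text specifies the masking rule for the median but not for the maximum. The function name `rescale_brink` is hypothetical.

```python
import statistics


def rescale_brink(cutout):
    """Median-subtract and normalize by the largest absolute pixel value (Equation 5)."""
    values = [v for row in cutout for v in row if v is not None]
    med = statistics.median(values)
    peak = max(abs(v) for v in values)   # assumed to run over unmasked pixels only
    if peak == 0.0:
        raise ValueError("cutout is identically zero; cannot rescale")
    return [[None if v is None else (v - med) / peak for v in row]
            for row in cutout]


# Tiny synthetic example: median 0.5, max |I| = 4.
b = rescale_brink([[-2.0, 0.0], [1.0, 4.0]])
print(b[1][1])   # prints "0.875"
```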


3.2.2. Feature Library 

Two feature libraries were investigated. The first was primarily “pixel-based.” For a given
object, each matrix element of the rescaled, compressed search, template, and difference
cutouts was used as a feature. The CCD ID number of each detection was also used, as DECam has
62 CCDs with specific artifacts (such as bad columns and hot pixels) as well as effects that
are reproducible on the same CCD depending on which field is observed (such as bright stars).
The signal-to-noise ratio of each detection was also used as a feature. The merits of this
feature space include relatively straightforward implementation and computational efficiency.
A production version of this pixel-based classifier was implemented in the DES-SN transient
detection pipeline at the beginning of Y1. In production, it became apparent that the
1,877-dimensional² feature space was dominated by uninformative features, and that better false
positive control could be achieved with a more compact feature set.
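
The dimensionality of the pixel-based feature space follows directly from its construction: three 25 × 25 cutouts contribute 625 pixels each, and the signal-to-noise ratio and CCD ID add two more features. The sketch below assembles such a vector; the ordering of the entries and the function name are assumptions made for illustration.

```python
def pixel_features(search, template, diff, snr, ccdid):
    """Flatten three 25 x 25 rescaled cutouts and append the two non-pixel features."""
    features = []
    for cutout in (search, template, diff):
        for row in cutout:
            features.extend(row)
    features.extend([snr, ccdid])
    return features


# Placeholder cutouts filled with zeros, plus invented snr and ccdid values.
blank = [[0.0] * 25 for _ in range(25)]
vector = pixel_features(blank, blank, blank, snr=12.3, ccdid=17)
print(len(vector))   # prints "1877" = 3 * 625 + 2
```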

We pursued an alternative feature space going forward, instead using 38 high-level metrics to
characterize detections of variability. A subset of the features are based on analogs from
Bloom et al. (2012) and Brink et al. (2013). In this section, we describe the features that are
new. We present an at-a-glance view of the entire autoScan feature library in Table 2.
Histograms and contours for the three most important features in the final autoScan model (see
§3.4) appear in Figure 4.

² 625 pixels on a 25 × 25 pixel cutout × 3 cutouts per detection + 2 non-pixel features
(snr, ccdid) = 1,877.
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Table 2 

autoScan’s feature library. 


Feature Name 

Importance 

Source 

Description 

r_aper_psf 

0.148 

New 

The average flux in a 5-pixel circular aperture centered on 
the object on the T cutout plus the flux from a 35 x 35-pixel 
PSF model-fit to the object on the I d cutout, all divided 
by the PSF model-fit flux. 

magdiff 

0.094 

B12 

If a source is found within 5” of the location of the object 
in the galaxy coadd catalog, the difference between mag and 
the magnitude of the nearby source. Else, the difference be¬ 
tween mag and the limiting magnitude of the parent image 
from which the I d cutout was generated. 

spreadunodel 

0.066 

New 

SPREAD_M0DEL output parameter from SExtractor on I d . 

n2sig5 

0.055 

B12 

Number of matrix elements in a 7 x 7 element block centered 
on the detection on R d with values less than -2. 

n3sig5 

0.053 

B12 

Number of matrix elements in a 7 x 7 element block centered 
on the detection on R d with values less than -3. 

n2sig3 

0.047 

B12 

Number of matrix elements in a 5 x 5 element block centered 
on the detection on R d with values less than -2. 

f lux_ratio 

0.037 

B12 

Ratio of the flux in a 5-pixel circular aperture centered on 
the location of the detection on I d to the absolute value of 
the flux in a 5-pixel circular at the same location on I t . 

n3sig3 

0.034 

B12 

Number of matrix elements in a 5 x 5 element block centered 
on the detection on R d with values less than -3. 

mag_ref _err 

0.030 

B12 

Uncertainty on mag_ref, if it exists. Else imputed. 

snr 

0.029 

B12 

The flux from a 35 x 35-pixel PSF model-fit to the object 
on I d divided by the uncertainty from the fit. 

colmeds 

0.028 

New 

The maximum of the median pixel values of each column 
on B d . 

nn_dist _renorm 

0.027 

B12 

The distance from the detection to the nearest source in the 
galaxy coadd catalog, if one exists within 5”. Else imputed. 

ellipticity 

0.027 

B12 

The ellipticity of the detection on I d using a_image and 
b_image from SExtractor. 

amp 

0.027 

B13 

Amplitude of fit that produced gauss. 

scale 

0.024 

B13 

Scale parameter of fit that produced gauss. 

b_image 

0.024 

B12 

Semi-minor axis of object from SExtractor on $I^d$.

mag_ref 

0.022 

B12 

The magnitude of the nearest source in the galaxy coadd
catalog, if one exists within 5” of the detection on $I^d$. Else
imputed.

diffsum 

0.021 

New 

The sum of the matrix elements in a 5 x 5 element box
centered on the detection location on $R^d$.

mag 

0.020 

B12 

The magnitude of the object from SExtractor on $I^d$.

a_ref 

0.019 

B12 

Semi-major axis of the nearest source in the galaxy coadd 
catalog, if one exists within 5”. Else imputed. 
3.2.3. New Features

In this section we present new features developed for autoScan. Let the superscripts
s, t, and d on matrices defined in the previous section denote search, template, and
difference images, respectively. The feature r_aper_psf is designed to identify badly
subtracted stars and galaxies on difference images caused by poor astrometric alignment
between search and template images. These objects typically appear as overlapping
circular regions of positive and negative flux colloquially known as “dipoles.” Examples
are presented in Figure 3. In these cases the typical search-template astrometric
misalignment scale is comparable to the FWHM of the PSF, causing the contributions of
the negative and positive regions to the total object flux from a PSF-model fit to be
approximately equal in magnitude but opposite in sign, usually with a slight positive
excess as the PSF fit is centered on the detection location, where the flux is always
positive. The total flux from a PSF-model fit to a dipole is usually greater than but
comparable to the average flux per pixel in a five-pixel circular aperture centered on
the detection location on the template image. To this end, let $F_{\mathrm{aper},I}$
be the flux from a five-pixel circular aperture centered on the location of a detection
on the uncompressed template image. Let $F_{\mathrm{psf},I}$ be the flux computed by
fitting a PSF model to a 35 x 35 pixel cutout centered on the location of the detection
on the uncompressed difference image. Then r_aper_psf is given by

$$ \mathrm{r\_aper\_psf} = \frac{F_{\mathrm{aper},I} + F_{\mathrm{psf},I}}{F_{\mathrm{psf},I}}. \tag{6} $$

We find that objects with r_aper_psf > 1.25 are almost entirely “dipoles.”

Fig. 3.— Difference image cutouts (left four columns; r_aper_psf values indicated) and
corresponding template image cutouts (right four columns) for objects with
r_aper_psf > 1.25.

Let $a \in \{2, 3\}$ and $b \in \{3, 5\}$. The four features n{a}sig{b}shift represent
the difference between the number of pixels with flux values greater than or equal to
$a$ in $(b+2) \times (b+2)$ element blocks centered on the detection position in $R^d$
and $R^t$. These features coarsely describe changes in the morphology of the source
between the template and search images.

The feature diffsum is the sum of the matrix elements in a 5 x 5 element
(2.8 x 2.8 arcsec^2) box centered on the detection location in $R^d$. It is given by

$$ \mathrm{diffsum} = \sum_{i=-2}^{2} \sum_{j=-2}^{2} R^d_{x_c+i,\,y_c+j}, \tag{7} $$

where $(x_c, y_c)$ is the location of the central element on $R^d$. It gives a coarse
measurement of the significance of the detection.

bandnum is a numeric representation of the filter in which the object was detected on
the search image. This feature enables autoScan to identify band-specific patterns.

numneg is intended to assess object smoothness by returning the number of negative
elements in a 7 x 7 pixel box centered on the object in $R^d$, exposing objects riddled
with negative pixels or objects that have a significant number of pixels below
med($R^d$). Used in concert with the S/N, numneg can help identify high-S/N objects with
spatial pixel intensity distributions that do not vary smoothly, which is useful in
rejecting hot pixels and cosmic rays.

lacosmic was designed to identify cosmic rays and other objects with spatial pixel
intensity distributions that do not vary smoothly, and is based loosely on the
methodology that van Dokkum (2001) uses to identify cosmic rays on arbitrary sky survey
images. Derive the “fine structure” image $F$ from $B^d$ according to

$$ F = (M_3 * B^d) - ([M_3 * B^d] * M_7), \tag{8} $$

where $M_n$ is an $n \times n$ median filter. Then

$$ \mathrm{lacosmic} = \max(B^d) / \max(F). \tag{9} $$

Relatively speaking, this statistic should be large for objects that do not vary
smoothly, and small
Table 2 —Continued

Feature Name

Importance

Source

Description

n3sig3shift

0.019

New

The number of matrix elements with values greater than or
equal to 3 in the central 5 x 5 element block of $R^d$ minus
the number of matrix elements with values greater than or
equal to 3 in the central 5 x 5 element block of $R^t$.

n3sig5shift

0.018

New

The number of matrix elements with values greater than or
equal to 3 in the central 7 x 7 element block of $R^d$ minus
the number of matrix elements with values greater than or
equal to 3 in the central 7 x 7 element block of $R^t$.

n2sig3shift

0.014

New

The number of matrix elements with values greater than or
equal to 2 in the central 5 x 5 element block of $R^d$ minus
the number of matrix elements with values greater than or
equal to 2 in the central 5 x 5 element block of $R^t$.

b_ref

0.012

B12

Semi-minor axis of the nearest source in the galaxy coadd
catalog, if one exists within 5”. Else imputed.

gauss

0.012

B13

$\chi^2$ from fitting a spherical, 2D Gaussian to a 15 x 15 pixel
cutout around the detection on $B^d$.

n2sig5shift

0.012

New

The number of matrix elements with values greater than or
equal to 2 in the central 7 x 7 element block of $R^d$ minus
the number of matrix elements with values greater than or
equal to 2 in the central 7 x 7 element block of $R^t$.

mag_from_limit

0.010

B12

Limiting magnitude of the parent image from which the $I^d$
cutout was generated minus mag.

a_image

0.009

B12

Semi-major axis of object on $I^d$ from SExtractor.

min_dist_to_edge

0.009

B12

Distance in pixels to the nearest edge of the detector array
on the parent image from which the $I^d$ cutout was generated.

ccdid

0.008

B13

The numerical ID of the CCD on which the detection was
registered.

flags

0.008

B12

Numerical representation of SExtractor extraction flags
on $I^d$.

numneg

0.007

New

The number of negative matrix elements in a 7 x 7 element
box centered on the detection in $R^d$.

l1

0.006

B13

$\mathrm{sign}(\sum B^d) \times \sum |B^d| / |\sum B^d|$

lacosmic

0.006

New

$\max(B^d) / \max(F)$, where $F$ is the LACosmic (van Dokkum
2001) “fine structure” image computed on $B^d$.

spreaderr_model

0.006

New

Uncertainty on spread_model.

maglim

0.005

B12

True if there is no nearby galaxy coadd source, false otherwise.

bandnum

0.004

New

Numerical representation of image filter.

maskfrac

0.003

New

The fraction of $I^d$ that is masked.


Note. —Source column indicates the reference in which the feature was first published. B13 indicates the
feature first appeared in Brink et al. (2013); B12 indicates the feature first appeared in Bloom et al. (2012),
and New indicates the feature is new in this work. See §3.3 for an explanation of how feature importances
are computed. Imputation refers to the procedure described in §3.2.4.
for objects that approximate a PSF. The reader
is referred to Figure 3 of van Dokkum (2001) for
visual examples.
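
The pixel-count features defined in this section (diffsum, numneg, and the four n{a}sig{b}shift features) reduce to sums and threshold counts over small blocks. The sketch below is illustrative only and is not the autoScan implementation: it assumes the matrices are plain nested lists indexed as `matrix[x][y]`, that $R^d$ and $R^t$ have already been normalized as described in the previous section, and that the detection lies far enough from the cutout edge for the block to fit. The helper names are hypothetical.

```python
def block(matrix, xc, yc, half):
    """Flat list of the (2*half+1) x (2*half+1) block of `matrix` centered on (xc, yc)."""
    return [matrix[x][y]
            for x in range(xc - half, xc + half + 1)
            for y in range(yc - half, yc + half + 1)]


def diffsum(r_d, xc, yc):
    """Equation (7): sum of the 5 x 5 block of R^d centered on the detection."""
    return sum(block(r_d, xc, yc, 2))


def numneg(r_d, xc, yc):
    """Number of negative elements in the 7 x 7 block of R^d centered on the detection."""
    return sum(1 for v in block(r_d, xc, yc, 3) if v < 0)


def nsig_shift(r_d, r_t, xc, yc, a, b):
    """n{a}sig{b}shift: count of elements >= a in the (b+2) x (b+2) block of R^d
    minus the same count on R^t."""
    half = (b + 1) // 2  # b=3 -> 5 x 5 block, b=5 -> 7 x 7 block

    def count(matrix):
        return sum(1 for v in block(matrix, xc, yc, half) if v >= a)

    return count(r_d) - count(r_t)


# Toy 9 x 9 example (made-up values): a compact 3 x 3 source on the difference
# image, one negative pixel nearby, and an empty template.
size, center = 9, 4
r_d = [[4.0 if abs(x - center) <= 1 and abs(y - center) <= 1 else 0.0
        for y in range(size)] for x in range(size)]
r_d[2][2] = -1.0
r_t = [[0.0] * size for _ in range(size)]

print(diffsum(r_d, center, center))                  # 9 * 4.0 - 1.0
print(numneg(r_d, center, center))
print(nsig_shift(r_d, r_t, center, center, a=3, b=3))
```

In practice these blocks would be array slices in a numerical library; nested lists are used here only to keep the sketch dependency-free.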


Bad columns and CCD edge effects that appear as fuzzy vertical streaks near highly
masked regions of difference images are common types of artifacts. Because they share
a number of visual similarities, we designed a single feature, colmeds, to identify
them:

$$ \mathrm{colmeds} = \max\left(\left\{ \mathrm{med}\left(\mathrm{transpose}(B^d)_i\right);\ i \in \{0 \ldots N_{\mathrm{col}} - 1\} \right\}\right), \tag{10} $$

where $N_{\mathrm{col}}$ is the number of columns in $B^d$. This feature operates on the
principle that the median of a column in $B^d$ should be comparable to the background
if the cutout is centered on a PSF, because, in general, even the column in which the
PSF is at its greatest spatial extent in $B^d$ should still contain more background
pixels than source pixels. However, for vertically oriented artifacts that occupy
entire columns on $B^d$, this does not necessarily hold. Since these artifacts
frequently appear near masked regions of images, we define
maskfrac as the percentage of $I^d$ that is masked.
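
Equation (10) can be sketched in a few lines. This is a minimal illustration, assuming $B^d$ is stored as a row-major nested list; the 7 x 7 toy cutouts and their pixel values are invented for the example and are not DES data.

```python
from statistics import median


def colmeds(b_d):
    """Equation (10): the maximum over columns of the per-column median of B^d."""
    columns = zip(*b_d)  # transpose a row-major matrix to iterate over its columns
    return max(median(column) for column in columns)


size = 7

# A compact, PSF-like source on a zero background: every column still holds
# more background pixels than source pixels, so every column median is zero.
psf_like = [[0.0] * size for _ in range(size)]
psf_like[3][3] = 10.0
for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
    psf_like[3 + dx][3 + dy] = 3.0

# A vertical streak that fills an entire column.
streak = [[8.0 if col == 3 else 0.0 for col in range(size)] for _ in range(size)]

print(colmeds(psf_like))  # background-level value
print(colmeds(streak))    # value of the streak
```

The contrast between the two outputs is the behavior the paragraph above describes: a column median stays near the background unless an artifact occupies most of a column.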


The feature spread_model (Desai et al. 2012; Bouy et al. 2013) is a SExtractor
star/galaxy separation output parameter computed on the $I^d$ cutout. It is a
normalized simplified linear discriminant between the best fitting local PSF
model and a slightly more extended model made from the same PSF convolved with
a circular exponential disk model.


3.2.4. Data Postprocessing

When there is not a source in the galaxy coadd
catalog within 5 arcsec of an object detected on
a difference image, certain classification features
cannot be computed for the object (see Table 2).
If the feature of an object cannot be computed,
it is assigned the mean value of that feature from
the training set.
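
The imputation rule can be sketched as follows. This is a hedged illustration rather than the autoScan code: it assumes each object is a dictionary of feature values, that a feature which could not be computed is stored as `None`, and that the training-set mean is taken over the training objects for which the feature exists (the text does not specify this last detail). The feature values shown are invented.

```python
def impute_with_training_means(rows, training_rows):
    """Replace missing feature values (None) with the training-set mean of that feature."""
    names = list(training_rows[0])
    means = {}
    for name in names:
        observed = [row[name] for row in training_rows if row[name] is not None]
        means[name] = sum(observed) / len(observed)
    return [{name: means[name] if row[name] is None else row[name] for name in names}
            for row in rows]


# Made-up training objects; the third has no coadd source within 5 arcsec.
training = [
    {"mag_ref": 20.0, "snr": 5.0},
    {"mag_ref": 22.0, "snr": 7.0},
    {"mag_ref": None, "snr": 9.0},
]
new_objects = [{"mag_ref": None, "snr": 12.0}]

print(impute_with_training_means(new_objects, training))
```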

Fig. 4.— Contours of r_aper_psf, magdiff, and spread_model (the three most important
features in the autoScan Random Forest model, computed using the feature importance
evaluation scheme described in §3.4) and the signal-to-noise ratio, snr. The importances
of r_aper_psf, magdiff, and spread_model were 0.148, 0.094, and 0.066, respectively. The
contours show that the relationships between the features are highly nonlinear and
better suited to machine learning techniques than hard selection cuts.

3.3. Classification Algorithm Selection

After we settled on an initial library of classification features, we compared three
well-known ML classification algorithms: a Random Forest (Breiman 2001), a Support
Vector Machine (SVM; Vapnik 1995), and an AdaBoost decision tree classifier (Zhu et al.
2009). We used scikit-learn (Pedregosa et al. 2012), an open source Python package for
machine learning, to instantiate examples of each model with standard settings. We
performed a three-fold cross-validated comparison using a randomly selected
100,000-detection subset of the training set described in §3.1. The subset was used to
avoid long training times for the SVM. For a description of cross validation and the
metrics used to evaluate each model, see §4.1 and §4.2. The results appear in Figure 5.
We found that the performance of all three models was comparable, but that the Random
Forest outperformed the other models by a small margin. We incorporated the Random
Forest model into autoScan.

Fig. 5.— Initial comparison of the performance of a Random Forest, a Support Vector
Machine with a radial basis function kernel, and an AdaBoost Decision Tree classifier
on the DES-SN artifact/non-artifact classification task. Each classifier was trained on
a randomly selected 67% of the detections from a 100,000-detection subset of the
training set, then tested on the remaining 33%. This process was repeated three times
until every detection in the subset was used in the testing set once. The curves above
represent the mean of each iteration. The closer a curve is to the origin, the better
the classifier. The unoptimized Random Forest outperformed the other two methods, and
was selected.

Random Forests are collections of decision trees, or cascading sequences of
feature-space unit tests, that are constructed from labeled training data. For an
introduction to decision trees, see Breiman et al. (1984). Random Forests can be used
for predictive classification or regression. During the construction of a supervised
Random Forest classifier, trees in the forest are trained individually. To construct a
single tree, the training algorithm first chooses a bootstrapped sample of the training
data. The algorithm then attempts to recursively define a series of binary splits on
the features of the training data that optimally separate the training data into their
constituent classes. During the construction of each node, a random subsample of
features with a user-specified size is selected with replacement. A fine grid of splits
on each feature is then defined, and the split that maximizes the increase in the purity
of the incident training data is chosen for the node.

Two popular metrics for sample purity are the Gini coefficient (Gini 1921) and the
Shannon entropy (Shannon 1948). Define the purity of a sample of difference image
objects to be³

$$ P = \frac{N_{NA}}{N_A + N_{NA}}, \tag{11} $$

where $N_{NA}$ is the number of non-artifact objects in the sample, and $N_A$ is the
number of artifacts in the sample. Note that $P = 1$ for a sample composed entirely of
non-artifacts, $P = 0$ for a sample composed entirely of artifacts, and $P(1 - P) = 0$
for a sample composed entirely of either artifacts or non-artifacts. Then the Gini
coefficient is

$$ \mathrm{Gini} = P(1 - P)(N_A + N_{NA}). \tag{12} $$

³ Some authors define $P = \sum_{NA} w_i / (\sum_{NA} w_i + \sum_{A} w_i)$, where $w_i$
is the weight of instance $i$, $\sum_A$ a sum over artifact events, and $\sum_{NA}$ a
sum over non-artifact events. This renders the definition of the Gini coefficient in
Equation 12 as Gini = $P(1 - P)$.

A tree with a Gini objective function seeks at each node to minimize the quantity

$$ \mathrm{Gini}_{lc} + \mathrm{Gini}_{rc}, \tag{13} $$

where $\mathrm{Gini}_{lc}$ is the Gini coefficient of the data incident on the node’s
left child, and $\mathrm{Gini}_{rc}$ is the Gini coefficient of the data incident on the
node’s right child. If $\mathrm{Gini}_{lc} + \mathrm{Gini}_{rc} > \mathrm{Gini}$, then
no split is performed and the node is declared a terminal node. The process proceeds
identically if another metric is used, such as the Shannon entropy, the most common
alternative. The Shannon entropy $S$ of a sample of difference image objects is given by

$$ S = -p_{NA} \log_2(p_{NA}) - p_A \log_2(p_A), \tag{14} $$

where $p_{NA}$ is the proportion of non-artifact objects in the sample, and $p_A$ is the
proportion of artifacts in the sample.

Nodes are generated in this fashion until a maximum depth or a user-specified measure
of node purity is achieved. The number of trees to grow in the forest is left as a free
parameter to be set by the user. Training a single Random Forest using the entire
~900,000 object training sample with the hyperparameters selected from the grid search
described in Table 3 took ~4.5 minutes when the construction of the trees was
distributed across 60 1.6 GHz AMD Opteron 6262 HE processors.

Random Forests treat the classes of unseen objects as unknown parameters that are
described probabilistically. An object to be classified descends each tree in the
forest, beginning at the root nodes. Once a data point arrives at a terminal node, the
tree returns the fraction of the training instances that reached that node that were
labeled “non-artifact.” The output of the trained autoScan Random Forest model on a
single input data instance is the average of the outputs of each tree, representing the
probability that the object is not an artifact, henceforth the “autoScan score” or
“ML score.” Ultimately, a score of 0.5 was adopted as the cut $\tau$ to separate real
detections of astrophysical variability from artifacts in the DES-SN data; see §4.4 for
details. Class prediction for 200,000 unseen data instances took 9.5 s on a single
1.6 GHz AMD Opteron 6262 HE processor.
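
The node-splitting rule of Equations (11) to (13) can be sketched for a single feature as follows. This is a simplified illustration, not the scikit-learn implementation used by autoScan: it assumes labels of 1 for non-artifacts and 0 for artifacts, and it tests midpoints between consecutive distinct feature values in place of the fine grid of splits described above. The function names are hypothetical.

```python
def gini(labels):
    """Equation (12): P(1 - P)(N_A + N_NA), with P = N_NA / (N_A + N_NA) from Equation (11).

    `labels` holds 1 for non-artifacts and 0 for artifacts.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return p * (1.0 - p) * n


def best_split(values, labels):
    """Return (threshold, Gini_lc + Gini_rc) minimizing Equation (13) for one feature.

    Returns None when there is no candidate split, or when every candidate gives
    Gini_lc + Gini_rc > Gini, in which case the node is terminal.
    """
    parent = gini(labels)
    best = None
    ordered = sorted(set(values))
    for low, high in zip(ordered, ordered[1:]):
        threshold = (low + high) / 2.0
        left = [label for value, label in zip(values, labels) if value <= threshold]
        right = [label for value, label in zip(values, labels) if value > threshold]
        total = gini(left) + gini(right)
        if total > parent:
            continue  # the split would not be performed
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Toy sample (made-up values): two artifacts at low feature values and two
# non-artifacts at high feature values are separated cleanly.
print(best_split([1.0, 2.0, 3.0, 4.0], [0, 0, 1, 1]))
```

A full tree would apply this search to each feature in the random subsample drawn at a node and recurse on the two children.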


3.4. Feature Importances 


Numeric importances can be assigned to the features in a trained forest based on the
amount of information they provided during training (Breiman et al. 1984). For each
tree $T$ in the forest, a tree-specific importance for feature $i$ is computed
according to

$$ C_{i,T} = \sum_{n \in T} N(n)\, B_i(n)\, [m(n) - m_{\mathrm{ch}}(n)], \tag{15} $$

where $n$ is an index over nodes in $T$, $N(n)$ is the number of training data points
incident on node $n$, $B_i(n)$ is 1 if node $n$ splits on feature $i$ and 0 otherwise,
$m(n)$ is the value of the objective function (usually the Gini coefficient or the
Shannon entropy, see §3.3) applied to the training data incident on node $n$, and
$m_{\mathrm{ch}}(n)$ is the sum of the values of the objective function applied to the
node’s left and right children. The global importance of feature $i$ is the average of
the tree-specific importances:

$$ I_i = \frac{1}{N_T} \sum_{T} C_{i,T}, \tag{16} $$

where $N_T$ is the number of trees in the forest. In this article, importances are
normalized to sum to unity.
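
Equations (15) and (16) translate directly into a short accumulation loop. The sketch below is illustrative: it assumes each tree is given as a list of its splitting nodes, with each node a dictionary holding the index of the feature it splits on (`feature`), $N(n)$ (`n`), $m(n)$ (`m`), and $m_{\mathrm{ch}}(n)$ (`m_ch`). This representation and the numbers in the two toy trees are hypothetical and are not taken from autoScan.

```python
def tree_importances(nodes, n_features):
    """Equation (15): C_{i,T} = sum over nodes of N(n) B_i(n) [m(n) - m_ch(n)]."""
    importances = [0.0] * n_features
    for node in nodes:
        # B_i(n) is 1 only for the feature the node splits on, so each node
        # contributes to exactly one entry.
        importances[node["feature"]] += node["n"] * (node["m"] - node["m_ch"])
    return importances


def forest_importances(forest, n_features):
    """Equation (16), followed by normalization so the importances sum to unity."""
    per_tree = [tree_importances(tree, n_features) for tree in forest]
    mean = [sum(c[i] for c in per_tree) / len(forest) for i in range(n_features)]
    total = sum(mean)
    return [value / total for value in mean]


# Two made-up trees over two features.
tree_a = [
    {"feature": 0, "n": 100, "m": 25.0, "m_ch": 10.0},
    {"feature": 1, "n": 40, "m": 5.0, "m_ch": 0.0},
]
tree_b = [
    {"feature": 0, "n": 100, "m": 25.0, "m_ch": 15.0},
    {"feature": 1, "n": 60, "m": 10.0, "m_ch": 5.0},
]

print(forest_importances([tree_a, tree_b], n_features=2))
```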


3.5. Optimization 

The construction of a Random Forest is governed by a number of free parameters called
hyperparameters. The hyperparameters of the Random Forest implementation used in this
work are n_estimators, the number of decision trees in the forest, criterion, the
function that measures the quality of a proposed split at a given tree node,
max_features, the number of features to randomly select when looking for the best split
at a given tree node, max_depth, the maximum depth of a tree, and min_samples_split, the
minimum number of samples required to split an internal node.

Table 3
Grid search results for autoScan hyperparameters.

Hyperparameter       Values
n_estimators         10, 50, 100*, 300
criterion            gini, entropy*
max_features         5, 6*
min_samples_split    2, 3*, 4, 10, 20, 50
max_depth            Unlimited*, 100, 30, 15, 5

Note. —A 3-fold cross-validated search over the grid of Random Forest hyperparameters
tabulated above was performed to characterize the performance of the machine
classifier. The hyperparameters of the best-performing classifier are marked with an
asterisk.

We performed a 3-fold cross-validated (see §4.2) grid search over the space of Random
Forest hyperparameters described in Table 3. A total of 1,884 trainings were performed.
The best classifier had 100 trees, used the Shannon entropy objective function, chose 6
features for each split, required at least 3 samples to split a node, and had unlimited
depth, and it was incorporated into the code. Recursive feature elimination (Brink et
al. 2013) was explored to improve the performance of the classifier, but we found that
it provided no statistically significant performance improvement.
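
The structure of such a grid search can be sketched as below. This is a schematic under stated assumptions, not the procedure used for autoScan: `cv_score` is a hypothetical callable that would train a forest with the given hyperparameters and return its mean cross-validated score, and `placeholder_score` is an arbitrary stand-in with no physical meaning, included only so the example runs. `None` is used here to represent the "Unlimited" depth setting.

```python
from itertools import product

# Hyperparameter values as tabulated in Table 3.
GRID = {
    "n_estimators": [10, 50, 100, 300],
    "criterion": ["gini", "entropy"],
    "max_features": [5, 6],
    "min_samples_split": [2, 3, 4, 10, 20, 50],
    "max_depth": [None, 100, 30, 15, 5],  # None stands for "Unlimited"
}


def grid_points(grid):
    """Yield one dictionary per combination of hyperparameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, cv_score):
    """Return the grid point with the highest score according to `cv_score`."""
    return max(grid_points(grid), key=cv_score)


def placeholder_score(params):
    # Stand-in only: NOT a measured classifier score. A real `cv_score` would
    # average a validation metric over the cross-validation folds.
    return params["n_estimators"] / (1 + params["min_samples_split"])


best = grid_search(GRID, placeholder_score)
print(best["n_estimators"], best["min_samples_split"])
```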


4. Performance 

In this section, we describe the performance of autoScan on a realistic classification
task and the effect of the code on the DES-SN transient candidate scanning load.
Performance statistics for the classification task were measured using production Y1
data, whereas candidate-level effects were measured using a complete reprocessing of Y1
data using an updated difference imaging pipeline. The reprocessed detection pool
differed significantly from its production counterpart, providing an out-of-sample data
set for benchmarking the effects of the code on the scanning load.⁴

⁴ Although the re-processing of data through the difference imaging pipeline from the
raw images is not useful for getting spectra of live transients, it is quite useful for
acquiring host-galaxy targets for previously missed transients and is therefore
performed regularly as pipeline improvements are made.

4.1. Performance Metrics

The performance of a classifier on an n-class task is completely summarized by the
corresponding n x n confusion matrix $E$, also known as a contingency table or error
matrix. The matrix element $E_{ij}$ represents the number of instances from the task’s
validation set with ground truth class label $j$ that were predicted to be members of
class $i$. A schematic 2 x 2 confusion matrix for the autoScan classification task is
shown in Figure 6.

From the confusion matrix, several classifier performance metrics can be computed. Two
that frequently appear in the literature are the False Positive Rate (FPR) and the
Missed Detection Rate (MDR; also known as the False Negative Rate). Using the notation
from Figure 6, the FPR is defined by

$$ \mathrm{FPR} = \frac{F_p}{F_p + T_n}, \tag{17} $$

and the missed detection rate by

$$ \mathrm{MDR} = \frac{F_n}{T_p + F_n}. \tag{18} $$

For autoScan, the FPR represents the fraction of artifacts in the validation set that
are predicted to be legitimate detections of astrophysical variability. The MDR
represents the fraction of non-artifacts in the task’s validation set that are
predicted to be artifacts. Another useful metric is the efficiency or True Positive
Rate (TPR),

$$ \epsilon = \frac{T_p}{T_p + F_n}, \tag{19} $$

which represents the fraction of non-artifacts in the sample that are classified
correctly. For the remainder of this study, we often refer to the candidate-level
efficiency measured on fake SNe Ia, $\epsilon_F$ (see §4.4).

Finally, the receiver operating characteristic (ROC) is a graphical tool for
visualizing the performance of a classifier. It displays FPR as a function of MDR, both
of which are parametric functions of $\tau$, the autoScan score that one chooses to
delineate the boundary between “non-artifacts” and “artifacts.” One can use the ROC to
determine the location at which the trade-off between the FPR and MDR is optimal for
the survey at hand, a function of both the scanning load and the potential bias
introduced by the classifier, then solve for the corresponding $\tau$. By benchmarking
the performance of the classifier using the ROC, one can paint a complete picture of
its performance that can also serve as a statistical guarantee on performance in
production, assuming a validation set and a production data set that are identically
distributed in feature space, and that detections are scanned individually in
production
(see §4.4).
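
The metrics of this section, and the way the ROC is traced out by varying $\tau$, can be sketched as follows. This is an illustrative implementation under stated assumptions: a detection is called a non-artifact when its score is strictly greater than $\tau$, both classes are present in the validation set (otherwise a denominator is zero), and the scores and labels in the example are invented.

```python
def confusion_counts(scores, is_real, tau):
    """Return (T_p, F_p, T_n, F_n) when detections with score > tau are called non-artifacts."""
    tp = fp = tn = fn = 0
    for score, real in zip(scores, is_real):
        predicted_real = score > tau
        if predicted_real and real:
            tp += 1
        elif predicted_real and not real:
            fp += 1
        elif not predicted_real and not real:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def rates(scores, is_real, tau):
    """Return (FPR, MDR) from Equations (17) and (18)."""
    tp, fp, tn, fn = confusion_counts(scores, is_real, tau)
    return fp / (fp + tn), fn / (tp + fn)


def roc_points(scores, is_real, taus):
    """Return (tau, FPR, MDR) for each class discrimination boundary in `taus`."""
    return [(tau,) + rates(scores, is_real, tau) for tau in taus]


# Made-up validation set: three non-artifacts followed by three artifacts.
scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
is_real = [True, True, True, False, False, False]

for tau, fpr, mdr in roc_points(scores, is_real, [0.3, 0.5, 0.75]):
    print(tau, fpr, mdr, 1.0 - mdr)  # the last column is the efficiency of Equation (19)
```

Raising $\tau$ lowers the FPR at the cost of a higher MDR, which is the trade-off the ROC makes visible.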


4.2. Classification Task

We used stratified 5-fold cross-validation to test the performance of autoScan. Cross
validation is a technique for assessing how the results of a statistical analysis will
generalize to an independent data set. In a k-fold cross-validated analysis, a data set
is partitioned into k disjoint subsets, and k iterations of training and testing are
performed. During the ith iteration, subset i is held out as a “validation” set of
labeled data instances that are not included in the training sample, and the union of
the remaining k - 1 subsets is passed to the classifier as a training set. The
classifier is trained and its predictive performance on the validation set is recorded.
In standard k-fold cross-validation, the partitioning of the original data set into
disjoint subsets is done by drawing samples at random without replacement from the
original data set. But in a stratified analysis, the drawing is performed subject to
the constraint that the distribution of classes in each subset be the same as the
distribution of classes in the original data set. Cross-validation is useful because it
enables one to characterize how a classifier’s performance varies with respect to
changes in the composition of training and testing data sets, helping quantify and
control “generalization error.”

4.3. Results

Figure 7 shows the ROCs that resulted from each round of cross-validation. We report
that autoScan achieved an average detection-level MDR of 4.0 ± 0.1 percent at a fixed
FPR of 2.5 percent with $\tau = 0.5$, which was ultimately adopted in the survey; see
§4.4. We found that autoScan scores were correlated with detection signal-to-noise
ratio (S/N). Figure 8 displays the fake efficiency and false positive rate of autoScan
using all out-of-sample detections of fake SNe from each round of cross-validation. At
S/N < 10, the out-of-sample fake efficiency is markedly lower than it is at higher S/N.
The efficiency asymptotically approaches unity for S/N > 100. The effect becomes more
pronounced when the class discrimination boundary is raised. This occurs because
legitimate detections of astrophysical variability at low S/N are similar to artifacts.
The false positive rate remains relatively constant in the S/N < 10 regime, where the
vast majority of artifacts reside.


Fig. 6.— Schematic confusion matrix for the autoScan classification task. Each matrix
element $E_{ij}$ represents the number of instances from the task’s validation set with
ground truth class label $j$ that were predicted to be members of class $i$.

Fig. 7.— 5-fold cross-validated receiver operating characteristics of the
best-performing classifier from §3.5. Six visually indistinguishable curves are
plotted: one translucent curve for each round of cross-validation, and one opaque curve
representing the mean. Points on the mean ROC corresponding to different class
discrimination boundaries $\tau$ are labeled. $\tau = 0.5$ was adopted in DES-SN.

Fig. 8.— Object-level fake efficiency and false positive rate as a function of S/N, at
several autoScan score cuts. The S/N is computed by dividing the flux from a PSF-model
fit to a 35 x 35 pixel cutout around the object in the difference image by the
uncertainty from the fit. The artifact rejection efficiency and missed detection rate
are 1 minus the false positive rate and fake efficiency, respectively. The fake
efficiency of autoScan degrades at low S/N, whereas the false positive rate is
relatively constant in the S/N regime not dominated by small number statistics.
$\tau = 0.5$ (bold) was adopted in DES-SN.

4.4. Effect of autoScan on Transient Candidate Scanning Load

As discussed in §2, DES-SN performs target selection and scanning using aggregates of
spatially coincident detections from multiple nights and filters (“candidates”). After
the implementation of autoScan, the NUMEPOCHS requirement described in Table 1 was
revised to require that a candidate be detected on at least two distinct nights having
at least one detection with an ML score greater than $\tau$ to become eligible for
visual scanning. In this section we describe the effect of this revision on the
scanning load for an entire observing season using a full reprocessing of the Y1 data.

We sought to minimize the size of our transient candidate scanning load with no more
than a 1 percent loss in $\epsilon_F$. By performing a grid search on $\tau$, we found
that we were able to reduce the number of candidates during the first observing season
of DES-SN by a factor of 13.4, while maintaining $\epsilon_F > 99.0$ percent by adopting
$\tau = 0.5$. After implementing autoScan using this $\tau$, we measured the quantity
$\langle N_A / N_{NA} \rangle$, the average ratio of artifact objects to non-artifact
detections that a human scanner encountered during a scanning session, using random
samples of 3,000 objects drawn from the pool of objects passing the modified and
unmodified cuts in Table 1. We found that the ratio decreased by a factor of roughly 40
after the production implementation of autoScan. Table 4
summarizes these results.
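
The revised two-night requirement can be sketched as a simple filter over a candidate's detections. This is an illustration only, assuming each detection is represented as a `(night, score)` pair; the night labels and scores below are invented, and the other selection requirements of Table 1 are not modeled.

```python
def eligible_for_scanning(detections, tau=0.5, min_nights=2):
    """True if the candidate has at least `min_nights` distinct nights that each
    contain at least one detection with an ML score greater than `tau`.

    `detections` is an iterable of (night, score) pairs for one candidate.
    """
    nights_passing = {night for night, score in detections if score > tau}
    return len(nights_passing) >= min_nights


# Hypothetical candidate: two high-scoring detections on the same night and one
# low-scoring detection on another night do not satisfy the requirement.
candidate = [("20131001", 0.9), ("20131001", 0.8), ("20131005", 0.3)]
print(eligible_for_scanning(candidate))

# A further night with a score above tau makes the candidate eligible.
candidate.append(("20131008", 0.6))
print(eligible_for_scanning(candidate))
```

Because the trigger aggregates per-detection scores, a candidate with many detections has more chances to pass, a point taken up in the Figure 9 caption.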

5. Discussion 

With the development of autoScan and the use of fake overlays to robustly measure
efficiencies, the goal of automating artifact rejection on difference images using
supervised ML classification has reached a certain level of maturity. With several
historical and ongoing time-domain surveys using ML techniques for candidate selection,
it is clear that the approach has been successful in improving astrophysical source
selection efficiency on images. However, there are still several ways the process could
be improved for large-scale transient searches of the future, especially for ZTF and
LSST, whose demands for reliability, consistency, and transparency will eclipse those
of contemporary surveys.

5.1. Automating Artifact Rejection in Future Surveys

For surveys like LSST and ZTF, small decreases in MDR are equivalent to the recovery of
vast numbers of new and interesting transients. Decreasing the size of the feature set
and increasing the importance of each feature is one of the most direct routes to
decreasing MDR. However, designing and engineering effective classification features is
among the most time-consuming and least intuitive aspects of framework design.
Improving MDR by revising feature sets is a matter of trial and error: occasionally,
performance improvements can result, but sometimes adding features can degrade the
performance of a classifier. Ideally, surveys that will retrain their classifiers
periodically will have a rigorous, deterministic procedure to extract the optimal
feature set from a given training data set. This is possible with the use of
convolutional neural networks (CNNs), a subclass of Artificial Neural Networks, that
can take images as input and infer an optimal set of features for a given set of
training data. The downside to CNNs is that the resulting features are significantly
more abstract than astrophysically motivated features and consequently can be more
difficult to interpret, especially in comparison with Random Forests, which assign each
feature a relative importance. However, CNNs have achieved high levels of performance
for a diverse array of problems. They remain relatively unexplored in the context of
astrophysical data processing, and bear examination for use in future surveys.


Next, unless great care is taken to produce a training data set that is drawn from the
same multidimensional feature distribution as the testing data, dense regions of
testing space might be completely devoid of training data, leading to an unacceptable
degradation of classification accuracy in production. Developing a rigorous method for
avoiding such sample selection bias is crucial for future surveys, for which small
biases in the training set can result in meaningful losses in efficiency. The idea of
incorporating active learning techniques into astronomical ML classification frameworks
has been advanced as a technique for reducing sample selection bias (Richards et al.
2012).


Given a testing set and a training set which 
are free to be drawn from different distributions 
in feature space, in the pool-based active learn¬ 
ing for classification framework, an algorithm it¬ 
eratively selects, out of the entire set of unlabeled 
data, the object (or set of objects) that would give 
the maximum performance gains for the classifi¬ 
cation model, if its true label were known. The 
algorithm then solicits a user to manually input 
the class of the object under consideration, and 
then the object is automatically incorporated into 
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Table 4 

Effect of autoScan on Reprocessed DES Y1 Transient Candidate Scanning Load. 



No ML 

ML (r = 0.5) 

ML / No ML 

N c a 

100,450 

7,489 

0.075 

(N A /N NA ) h 

13 

0.34 

0.027 

e F c 

1.0 

0.990 

0.990 


a Total number of science candidates discovered. 

b Average ratio of artifact to non-artifact detections in 
human scanning pool determined from scanning 3,000 ran¬ 
domly selected detections from all science candidate detec¬ 
tions. 

c autoScan candidate-level efficiency for fake SNe la. 



Fig. 9. Twenty-four consecutively observed difference image cutouts of a poorly subtracted galaxy that was 
wrongly identified as a transient. The autoScan score of each detection appears at the bottom of each cutout. 
The misidentification occurred because on two nights the candidate had a detection that received a score above 
an autoScan class discrimination boundary τ = 0.4 used during early code tests (green boxes). Night-to-night 
variations in observing conditions, data reduction, and image subtraction can cause detections of artifacts to 
appear real. If a two-night trigger is used, spurious “transients” like this one can easily accumulate as 
a season goes on. Consequently, care must be taken when using an artifact rejection framework that scores 
individual detections to make statements about aggregates of detections. Each image is labeled with the 
observation date and filter for the image, in the format YYYYMMDD-filter. 
future training sets to improve upon the original 
classifier. Under this paradigm, human scanners 
would play the valuable role of helping the 
classifier learn from its mistakes, and each human 
hour spent vetting data would immediately carry 
scientific return. Active learning could produce 
extremely powerful classifiers over short timescales 
when used in concert with generative models for 
training data. Instead of relying on historical data 
to train artifact rejection algorithms during 
commissioning phases, experiments like LSST could 
use generative models for survey observations to 
simulate new data sets. After training a classifier 
using simulated data, active learning could be used 
in production to automatically fill in gaps in 
classifier knowledge and augment predictive 
accuracy. 
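
The pool-based loop described above can be sketched in a few lines. This is a minimal illustration and not part of autoScan: it substitutes uncertainty sampling (querying the objects whose scores lie closest to 0.5) for the ideal "maximum performance gain" criterion, and the names `active_learning`, `query_most_uncertain`, and `ask_label` are hypothetical. The classifier is assumed to expose scikit-learn-style `fit` and `predict_proba` methods.

```python
def query_most_uncertain(scores, batch_size=1):
    """Return indices of the pool objects whose scores lie closest to 0.5."""
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:batch_size]


def active_learning(clf, X_train, y_train, X_pool, ask_label,
                    n_rounds=10, batch_size=5):
    """Pool-based active learning with uncertainty sampling (a stand-in
    for selecting the object with the maximum expected performance gain).

    clf       : object with scikit-learn-style fit(X, y) and predict_proba(X)
    ask_label : hypothetical callback that shows one object to a human
                scanner and returns its class (0 = artifact, 1 = non-artifact)
    """
    X_train, y_train, X_pool = list(X_train), list(y_train), list(X_pool)
    for _ in range(n_rounds):
        if not X_pool:
            break
        clf.fit(X_train, y_train)
        # Column 1 is taken as the non-artifact score; this assumes that
        # both classes (0 and 1) are present in the training labels.
        scores = [row[1] for row in clf.predict_proba(X_pool)]
        chosen = query_most_uncertain(scores, batch_size)
        for i in chosen:
            X_train.append(X_pool[i])
            y_train.append(ask_label(X_pool[i]))
        # Delete in descending index order so earlier deletions do not
        # shift the positions of the remaining chosen objects.
        for i in sorted(chosen, reverse=True):
            del X_pool[i]
    clf.fit(X_train, y_train)  # final fit on the augmented training set
    return clf, X_train, y_train, X_pool
```

In practice `clf` could be a random forest trained first on simulated data, with `ask_label` backed by a scanning interface; each round then spends human effort only on the detections the current model finds most ambiguous.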


In this work, we used a generative model of 
SN Ia observations (overlaying fake SNe Ia onto 
real host galaxies) to produce the “Non-Artifact” 
component of our training data set. However, the 
nearly 500,000 artifacts in our training set were 
human-scanned, implying that future surveys will 
still need to do a great deal of scanning before 
being able to get an ML classifier off the ground. 
A new survey should not intentionally alter the 
pipeline to produce artifacts during commissioning, 
as it is crucial that the unseen data be drawn 
from the same feature distributions as the training 
data. For surveys with N_A/N_NA > 100, Brink 
et al. (2013) showed that a robust artifact library 
can be prepared by randomly sampling from all 
detections of variability produced by the 
difference imaging pipeline. For surveys or pipelines 
that do not produce as many artifacts, some 
initial scanning of commissioning data to build a 
library of a few × 10^4 artifacts should be 
sufficient to produce an initial training set 
(Brink et al. 2013; du Buisson et al. 2014). 


5.2. Eliminating Spurious Candidates 

With a two-night trigger, some spurious science 
candidates can be created by nightly variations 
in astrometry, observing conditions, and 
repeatedly imaged source brightnesses that cause 
night-to-night fluctuations in the appearance of 
candidates on difference images. These variations 
lead to a spread of ML scores for a given candidate. 
As an observing season progresses, artifacts can 
accumulate large numbers of detections via repeated 
visits. Although for a typical artifact the vast 
majority of detections fail the ML requirement, the 
fluctuations in ML scores can cause a small 
fraction of the detections to satisfy the autoScan 
requirement. Figure 9 shows an example of this 
effect. 
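
A simple binomial estimate illustrates how quickly this accumulation can matter. The sketch below is not taken from the DES-SN pipeline: it assumes, purely for illustration, that each visit to an artifact falls on a separate night, passes the autoScan requirement independently with the same probability `p_pass`, and that the trigger fires once `n_required` visits have passed. Real detections are correlated through observing conditions, so the numbers are only indicative.

```python
from math import comb


def p_spurious_candidate(p_pass, n_visits, n_required=2):
    """Probability that at least n_required of n_visits detections of an
    artifact pass the ML requirement, assuming independent detections
    that each pass with probability p_pass (an illustrative assumption).
    """
    # Probability of fewer than n_required passing detections.
    p_fewer = sum(
        comb(n_visits, k) * p_pass**k * (1.0 - p_pass)**(n_visits - k)
        for k in range(n_required)
    )
    return 1.0 - p_fewer
```

For n_required = 2 this reduces to 1 - (1 - p)^n - n p (1 - p)^(n - 1), which grows steadily with the number of visits n even when the per-detection pass probability p is small. Evaluating `p_spurious_candidate(p, n)` for a hypothetical p at increasing n shows the effect.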

The buildup of spurious multi-night candidates 
could be mitigated by implementing a second ML 
classification framework that takes as input 
multi-night information, including the 
detection-level output of autoScan, to predict 
whether a given science candidate represents a 
bona fide astrophysical source. Training data 
could be compiled by randomly selecting 
time-contiguous strings of detections from 
known candidates. The lengths of the strings 
could be drawn from a distribution specified 
during framework development. Candidate-level 
features could characterize the temporal variation 
of detection-level features, such as the highest 
and lowest night-to-night shifts in autoScan score, 
magnitude, and astrometric uncertainty. 
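
As a rough sketch of what such candidate-level inputs might look like, the functions below draw a random time-contiguous string of detections and compute the highest and lowest night-to-night shifts of each detection-level quantity. Everything here is an assumption made for illustration: the dictionary keys `score`, `mag`, and `astrom_err`, the function names, the use of signed differences for the shifts, and the uniform draw of the string length from `length_choices`.

```python
import random


def random_contiguous_string(detections, length_choices, rng=random):
    """Pick a random time-contiguous run of detections for one candidate.

    detections     : list of per-night records, sorted by observation time
    length_choices : allowed string lengths, sampled uniformly here
    """
    length = min(rng.choice(length_choices), len(detections))
    start = rng.randrange(len(detections) - length + 1)
    return detections[start:start + length]


def shift_features(detections, keys=("score", "mag", "astrom_err")):
    """Highest and lowest signed night-to-night shift of each quantity."""
    features = {}
    for key in keys:
        values = [d[key] for d in detections]
        shifts = [b - a for a, b in zip(values, values[1:])]
        # A single detection has no night-to-night shift; use 0.0.
        features[f"max_shift_{key}"] = max(shifts) if shifts else 0.0
        features[f"min_shift_{key}"] = min(shifts) if shifts else 0.0
    return features
```

The resulting feature dictionaries, one per sampled string, could then be stacked into a training matrix for the second-stage classifier.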